Using syntactic information in handling natural language queries for extended boolean retrieval model
Abstract
There is considerable evidence that trained users achieve better search effectiveness with structured boolean queries than with simple keyword queries, because boolean operators help represent users' information needs more accurately. However, it is not easy for ordinary users to construct effective boolean queries with appropriate boolean operators. In this paper, we propose a syntax-based technique for handling natural language queries and phrases for the extended boolean retrieval model, in order to pursue both search effectiveness and user convenience. First, natural language queries are syntactically analyzed using a Korean natural language parser, and the resulting syntactic trees are structurally simplified using a tree-simplifying mechanism in order to capture the logical relationships between keywords. Second, in a simplified tree, plausible noun phrases are identified and added to the tree as additional keywords for more precise retrieval. Finally, the tree is automatically converted into a boolean query using mapping rules and linguistic heuristics. We also propose an n-best tree method, which uses the top n syntactic trees to compensate for the detrimental effects of a single incorrect top syntactic tree. In experiments using KTSET2.0 (a Korean standard document set), the proposed method outperformed natural language models without any syntactic analysis by 23% and, surprisingly, outperformed even manually constructed boolean queries by 8% in 11-point average precision.
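As a rough illustration of how a boolean query tree can be scored in an extended boolean setting, the sketch below uses the p-norm formulation with unweighted query terms. The abstract does not say which extended boolean variant the authors used, so the choice of p-norm is an assumption, and the names `Term`, `Op`, and `pnorm_score`, as well as the sample weights, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    """Leaf of the query tree: a single keyword (hypothetical type)."""
    word: str


@dataclass
class Op:
    """Inner node of the query tree: 'AND' or 'OR' over child nodes."""
    kind: str
    children: List["Node"]


Node = Union[Term, Op]


def pnorm_score(node: Node, weights: Dict[str, float], p: float = 2.0) -> float:
    """Score one document against a boolean query tree.

    Assumption: p-norm extended boolean model, unweighted query terms.
    `weights` maps each keyword to its weight in the document, in [0, 1].
    """
    if isinstance(node, Term):
        # A keyword missing from the document contributes weight 0.
        return weights.get(node.word, 0.0)
    scores = [pnorm_score(child, weights, p) for child in node.children]
    n = len(scores)
    if node.kind == "OR":
        # Normalized distance from the all-zeros point.
        return (sum(s ** p for s in scores) / n) ** (1.0 / p)
    if node.kind == "AND":
        # One minus the normalized distance from the all-ones point.
        return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)
    raise ValueError(f"unknown operator: {node.kind}")


# Hypothetical query: boolean AND (query OR retrieval)
query = Op("AND", [Term("boolean"), Op("OR", [Term("query"), Term("retrieval")])])
doc_weights = {"boolean": 0.8, "query": 0.2}  # made-up term weights
score = pnorm_score(query, doc_weights, p=2.0)
```

With p = 1 both operators reduce to the average of the child scores, and as p grows AND approaches the minimum and OR the maximum, which is what lets this kind of model rank documents that only partially satisfy a boolean query.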
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Freestyle vs. Boolean: A Comparison of Partial and Exact Match Retrieval Systems
Although Boolean searching has been the standard model for commercial information retrieval systems for the past three decades, natural language input and partial-match weighted retrieval have recently emerged from the laboratories to become a searching option in several well-known online systems. The purpose of this investigation is to compare the performance of one of these partial match opt...
Natural Language Processing and XML Retrieval
XML information retrieval (XML-IR) systems respond to user queries with results more specific than documents. XML-IR queries contain both content and structural requirements traditionally expressed in a formal language. However, an intuitive alternative is natural language queries (NLQs). Here, we discuss three approaches for handling NLQs in an XML-IR system that are comparable to, and even out...
Public Transport Ontology for Passenger Information Retrieval
Passenger information aims at improving the user-friendliness of public transport systems while influencing passenger route choices to satisfy transit user’s travel requirements. The integration of transit information from multiple agencies is a major challenge in implementation of multi-modal passenger information systems. The problem of information sharing is further compounded by the multi-l...
Publication date: 1999